
    Query-Driven Global Graph Attention Model for Visual Parsing: Recognizing Handwritten and Typeset Math Formulas

    We present a new visual parsing method based on standard Convolutional Neural Networks (CNNs) for handwritten and typeset mathematical formulas. The Query-Driven Global Graph Attention (QD-GGA) parser employs multi-task learning, using a single feature representation for locating, classifying, and relating symbols. QD-GGA parses formulas by first constructing a Line-Of-Sight (LOS) graph over the input primitives (e.g., handwritten strokes or connected components in images). Second, class distributions for LOS nodes and edges are obtained using query-specific feature filters (i.e., attention) in a single feed-forward pass. This allows end-to-end structure learning using a joint loss over primitive node and edge class distributions. Finally, a Maximum Spanning Tree (MST) is extracted from the weighted graph using Edmonds' Arborescence Algorithm. The model may be run recurrently over the input graph, updating attention to focus on symbols detected in the previous iteration. QD-GGA requires no additional grammar rules; its language model is learned from the sets of symbols and relationships in the training set and their statistics. We benchmark our system against state-of-the-art math recognition systems for both handwritten and typeset input. Our preliminary results show that this is a promising new approach for visual parsing of math formulas. Using recurrent execution, symbol detection is near perfect for both handwritten and typeset formulas: we obtain a symbol F-measure of over 99.4% for both the CROHME (handwritten) and INFTYMCCDB-2 (typeset formula image) datasets. Our method is also much faster in both training and execution than state-of-the-art RNN-based formula parsers. The unlabeled structure detection of QD-GGA is competitive with encoder-decoder models, but QD-GGA symbol and relationship classification is weaker. We believe this may be addressed through increased use of spatial features and global context.
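
    As a rough illustration of the pipeline described in the abstract, the sketch below builds a candidate graph over input primitives, weights its edges, and extracts a maximum spanning arborescence with Edmonds' algorithm. It is only a minimal sketch under stated assumptions: the distance-threshold graph stands in for the true LOS graph, score_edges() stands in for the query-driven CNN attention module, and networkx is an assumed dependency; none of these names come from the QD-GGA implementation.

```python
# Minimal sketch of a QD-GGA-style parse: candidate graph -> edge scores -> MST.
import itertools
import math
import networkx as nx

def build_candidate_graph(primitives, max_dist=2.0):
    """Connect primitive pairs whose centers are close; a crude stand-in
    for the Line-Of-Sight (LOS) graph over strokes / connected components."""
    g = nx.DiGraph()
    g.add_nodes_from(range(len(primitives)))
    for i, j in itertools.permutations(range(len(primitives)), 2):
        if math.dist(primitives[i], primitives[j]) <= max_dist:
            g.add_edge(i, j)
    return g

def score_edges(graph):
    """Placeholder for the query-driven CNN: in QD-GGA these weights come
    from edge-class distributions produced in a single feed-forward pass."""
    for u, v in graph.edges:
        graph[u][v]["weight"] = 1.0 / (1 + abs(u - v))  # dummy confidence

def parse_formula(primitives):
    graph = build_candidate_graph(primitives)
    score_edges(graph)
    # Edmonds' algorithm extracts the maximum spanning arborescence
    # (directed MST), giving the formula's layout tree over primitives.
    return nx.maximum_spanning_arborescence(graph, attr="weight")

if __name__ == "__main__":
    # Toy primitives: (x, y) centers of three strokes / components.
    tree = parse_formula([(0.0, 0.0), (1.0, 0.2), (2.0, 0.1)])
    print(list(tree.edges(data=True)))
```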

    ICDAR 2019 CROHME + TFD: Competition on Recognition of Handwritten Mathematical Expressions and Typeset Formula Detection

    We summarize the tasks, protocol, and outcomes for the 6th Competition on Recognition of Handwritten Mathematical Expressions (CROHME), which includes a new formula detection in document images task (+ TFD). For CROHME + TFD 2019, participants chose between two tasks for recognizing handwritten formulas from 1) online stroke data, or 2) images generated from the handwritten strokes. To compare LaTeX strings against the labeled directed trees over strokes (label graphs) used in previous CROHMEs, we convert both LaTeX and stroke-based label graphs to label graphs defined over symbols (symbol-level label graphs, or symLG). More than thirty (33) participants registered for the competition, with nineteen (19) teams submitting results. The strongest formula recognition results were produced by the USTC-iFLYTEK research team, for both stroke-based (81%) and image-based (77%) input. For the new typeset formula detection task, the Samsung R&D Institute Ukraine (Team 2) obtained a very strong F-score (93%). System performance has improved since the last CROHME; still, the competition results suggest that recognition of handwritten formulae remains a difficult structural pattern recognition task.
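
    The symLG conversion mentioned above can be pictured as collapsing strokes that share a symbol into a single node while keeping the relationship edges between symbols. The minimal sketch below illustrates that idea under stated assumptions: it is not the official CROHME/LgEval tooling, and the graph representation (a networkx DiGraph with 'label' node attributes and 'rel' edge attributes) and the strokes_to_symlg helper are hypothetical.

```python
# Sketch: collapse a stroke-level label graph into a symbol-level label graph.
import networkx as nx

def strokes_to_symlg(stroke_graph, stroke_to_symbol):
    """stroke_graph: DiGraph with node attr 'label' (symbol class) and
    edge attr 'rel' (spatial relationship, e.g. 'Right', 'Sup').
    stroke_to_symbol: dict mapping stroke id -> symbol id."""
    symlg = nx.DiGraph()
    for stroke, data in stroke_graph.nodes(data=True):
        # Strokes of the same symbol map to a single symbol node.
        symlg.add_node(stroke_to_symbol[stroke], label=data["label"])
    for u, v, data in stroke_graph.edges(data=True):
        su, sv = stroke_to_symbol[u], stroke_to_symbol[v]
        if su != sv:  # drop intra-symbol ("merge") edges
            symlg.add_edge(su, sv, rel=data["rel"])
    return symlg
```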